{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sayakpaul/ml-deployment-k8s-fastapi/blob/main/notebooks/TF_to_ONNX.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ahaLOgxyzACW"
      },
      "source": [
        "# Convert a `tf.keras` model to ONNX\n",
        "\n",
        "This tutorial shows:\n",
        "- how to convert a `tf.keras` model to ONNX, either from a saved model file or directly from the source code.\n",
        "- how the CPU inference time of the `tf.keras` model compares with that of the converted ONNX model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CmnzNRTkzaYq"
      },
      "source": [
        "## Install ONNX dependencies\n",
        "- `tf2onnx` provides a tool to convert TensorFlow models to ONNX.\n",
        "- `onnxruntime` is used to run inference on a saved ONNX model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Y7VIFntKUh0R"
      },
      "outputs": [],
      "source": [
        "!pip install -Uqq tf2onnx\n",
        "!pip install -Uqq onnxruntime"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "D7TJluNyz8k0"
      },
      "source": [
        "### Imports"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-UfszPPVf9P0"
      },
      "outputs": [],
      "source": [
        "import tf2onnx\n",
        "import tensorflow as tf\n",
        "import numpy as np"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_eo3f1Zn0S3F"
      },
      "source": [
        "### Get a sample model "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3R81akF_hDEL"
      },
      "outputs": [],
      "source": [
        "core = tf.keras.applications.ResNet50(include_top=True, input_shape=(224, 224, 3))\n",
        "\n",
        "inputs = tf.keras.layers.Input(shape=(224, 224, 3), name=\"image_input\")\n",
        "preprocess = tf.keras.applications.resnet50.preprocess_input(inputs)\n",
        "outputs = core(preprocess, training=False)\n",
        "model = tf.keras.Model(inputs=[inputs], outputs=[outputs])"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Note that the preprocessing step is included in the `model` object. This lets us load an image from disk and run the model on it directly, without any\n",
        "model-specific preprocessing on the caller's side, which reduces training/serving skew."
      ],
      "metadata": {
        "id": "W3smmoIBCFOX"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MQg5cN910Z6q"
      },
      "source": [
        "## Convert to ONNX"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3-friv_fMk79"
      },
      "outputs": [],
      "source": [
        "# Inspect the first and last layer names of the wrapped model.\n",
        "print(f'first layer name: {model.layers[0].name}')\n",
        "print(f'last layer name: {model.layers[-1].name}')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UBGQxHHz0dGP"
      },
      "source": [
        "### Conversion\n",
        "\n",
        "`opset` in `tf2onnx.convert.from_keras` is the ONNX opset version to target. You can find the full list of TensorFlow (TF) Ops that are convertible to ONNX Ops [here](https://github.com/onnx/tensorflow-onnx/blob/master/support_status.md).\n",
        "\n",
        "There are two ways to convert a TensorFlow model to ONNX:\n",
        "- `tf2onnx.convert.from_keras` to convert programmatically\n",
        "- `tf2onnx.convert` CLI to convert a saved TensorFlow model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "_MAEoy9j0QRQ"
      },
      "outputs": [],
      "source": [
        "import onnx\n",
        "\n",
        "# Option 1: convert the in-memory Keras model programmatically.\n",
        "input_signature = [tf.TensorSpec([None, 224, 224, 3], tf.float32, name='image_input')]\n",
        "onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature, opset=15)\n",
        "onnx.save(onnx_model, \"resnet50_w_preprocessing.onnx\")\n",
        "\n",
        "# Option 2: export a SavedModel and convert it with the CLI.\n",
        "# model.save('my_model')\n",
        "# !python -m tf2onnx.convert --saved-model my_model --output my_model.onnx --opset 15"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "V2-aNpahQMVR"
      },
      "source": [
        "## Test TF vs ONNX model with dummy data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Zt5lsQoUQXOo"
      },
      "source": [
        "### Generate dummy data "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ceqZH2KbPznx"
      },
      "outputs": [],
      "source": [
        "dummy_inputs = tf.random.normal((32, 224, 224, 3))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "M8DR47zeQZHI"
      },
      "source": [
        "### Test original TF model with dummy data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zL8Lw9H8QbT7"
      },
      "outputs": [],
      "source": [
        "%%timeit\n",
        "model.predict(dummy_inputs)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "smFa5VWjTNLb"
      },
      "outputs": [],
      "source": [
        "tf_preds = model.predict(dummy_inputs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Lqhi458k0fkM"
      },
      "source": [
        "### Test converted ONNX model with dummy data\n",
        "\n",
        "To run inference on a GPU, set `providers=[\"CUDAExecutionProvider\"]` in `ort.InferenceSession` (this requires the `onnxruntime-gpu` package).\n",
        "\n",
        "The first argument of `sess.run` is set to `None`, which means that all of the model's outputs are returned."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1ELVBwrn0-Cf"
      },
      "outputs": [],
      "source": [
        "import onnxruntime as ort\n",
        "\n",
        "# Run on the CPU; pass providers=[\"CUDAExecutionProvider\"] instead to run on a GPU.\n",
        "sess = ort.InferenceSession(\"resnet50_w_preprocessing.onnx\", providers=[\"CPUExecutionProvider\"])\n",
        "np_dummy_inputs = dummy_inputs.numpy()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jszhyR15SJaE"
      },
      "outputs": [],
      "source": [
        "%%timeit \n",
        "sess.run(None, {\"image_input\": np_dummy_inputs})"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ax6opk4ENmlK"
      },
      "outputs": [],
      "source": [
        "ort_preds = sess.run(None, {\"image_input\": np_dummy_inputs})"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Check if the TF and ONNX outputs match"
      ],
      "metadata": {
        "id": "jbrwQMDbBLps"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "np.testing.assert_allclose(tf_preds, ort_preds[0], atol=1e-4)"
      ],
      "metadata": {
        "id": "um99Uu4FBPrY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QPu6kdNnU8Y6"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "We ran a simple experiment on a dummy batch of 32 inputs. By default, `timeit` reports the mean and standard deviation of the cell's execution time over 7 runs ([`timeit`'s default behaviour](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)).\n",
        "\n",
        "\n",
        "The ONNX model will likely have lower inference latency than the TF model if you are using a CPU server for inference."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "TF to ONNX.ipynb",
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.8.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}